Three Main Goals of this File
Produce cleaner-looking code.
Identify the number of clusters.
Identify the top genes expressed in each cluster.
Save objects as RDS files so I don't have to rerun the whole code.
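A minimal sketch of the RDS idea, using only base R. The helper `cache_rds`, the stand-in `slow_step`, and the file path are hypothetical names that do not appear in this analysis; the small data frame stands in for a Seurat object or marker table.

```r
# Hypothetical helper: compute once, then reload from disk on later runs.
cache_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))   # reuse the saved object
  }
  obj <- compute()          # expensive step runs only when no cache exists
  saveRDS(obj, path)
  obj
}

# Stand-in for an expensive step; in the real analysis this would be
# something like the clustering or the FindAllMarkers() call.
path <- file.path(tempdir(), "example_result.rds")
unlink(path)                # start from a clean state for this demo
n_calls <- 0
slow_step <- function() {
  n_calls <<- n_calls + 1
  data.frame(cluster = 0:2, n_cells = c(100, 50, 25))
}

first  <- cache_rds(path, slow_step)  # computes and saves
second <- cache_rds(path, slow_step)  # loads from disk, no recomputation
```

The cached file has to be deleted by hand whenever an upstream step changes, otherwise the stale object is reloaded.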
options(future.globals.maxSize = 74 * 1024^3) # 74 GiB
getOption("future.globals.maxSize")
## [1] 79456894976
Based on this, I can see that:
SO1 -> control
SO2 -> low_salt
SO3 -> low_salt
SO4 -> control
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 11426
## Number of edges: 376311
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8730
## Number of communities: 6
## Elapsed time: 0 seconds
## Calculating cluster 0
## For a (much!) faster implementation of the Wilcoxon Rank Sum Test,
## (default method for FindMarkers) please install the presto package
## --------------------------------------------
## install.packages('devtools')
## devtools::install_github('immunogenomics/presto')
## --------------------------------------------
## After installation of presto, Seurat will automatically use the more
## efficient implementation (no further action necessary).
## This message will be shown once per session
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
# Note: despite its name, top10 holds the top 5 markers per cluster (n = 5).
SO5m %>%
  group_by(cluster) %>%
  dplyr::filter(avg_log2FC > 1) %>%
  slice_head(n = 5) %>%
  ungroup() -> top10
DoHeatmap(SO5, features = top10$gene) + NoLegend()
## Warning in DoHeatmap(SO5, features = top10$gene): The following features were
## omitted as they were not found in the scale.data slot for the SCT assay: Ifi47,
## Rpl3-ps1
Observation: Mcub seems to be the most distinctly expressed gene in low salt.
## Cluster 3
# Subset Cluster 0, 1, 3
# SO6 checking another cluster
SO6 <- subset(SO5, idents = c("0", "1", "3"))
SO6 <- FindNeighbors(SO6, dims = 1:30, verbose = FALSE)
SO6 <- FindClusters(SO6, resolution = 0.1)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 8805
## Number of edges: 290708
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9087
## Number of communities: 2
## Elapsed time: 0 seconds
## Calculating cluster 0
## Calculating cluster 1
My guess is that these are the same cells: as conditions go from control to low_salt, the cells start to express different genes. How can I test this?
The next step, once I have figured out these clusters, is to look up what each of the top genes does: its function and purpose.
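One possible first check is whether cluster membership depends on condition at all. The sketch below runs a chi-squared test of independence on a cluster-by-condition table. The counts are made up for illustration (not real data), and the `condition` metadata column mentioned in the comment is an assumption about how the object is annotated.

```r
# Hypothetical cell counts per cluster and condition (NOT real data).
# In the real analysis this table could come from something like
# table(Idents(SO6), SO6$condition), assuming a `condition` column exists.
counts <- matrix(
  c(300, 100,
     80, 320),
  nrow = 2, byrow = TRUE,
  dimnames = list(cluster = c("0", "1"),
                  condition = c("control", "low_salt"))
)

# Share of each condition's cells that falls in each cluster
prop.table(counts, margin = 2)

# Test whether cluster membership is independent of condition
res <- chisq.test(counts)
res$p.value
```

A small p-value only says the clusters are unevenly split between conditions. It treats every cell as an independent observation and does not by itself show that one cluster turns into the other, so it is a starting point rather than a proof.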
Next steps:
Take a few steps back and identify the S100g cluster.
Multidimensional dot plot (top expressed genes).
Possibly filter out Rp (ribosomal) genes, and maybe mt (mitochondrial) genes too.
Annotate objects.
Read Janos' article.
Next lesson: DEG list, pathway analysis, volcano plot.
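For the Rp/mt filtering step, a pattern match on gene symbols is one option. This is a sketch under assumptions: the gene vector is hypothetical, and the prefixes `Rpl`/`Rps` (ribosomal protein) and `mt-` (mitochondrial) follow the usual mouse naming, which should be checked against the actual row names.

```r
# Hypothetical gene symbols; in the real analysis these could come from
# rownames(SO5) or from the `gene` column of the marker table.
genes <- c("S100g", "Rpl3", "Rps27a", "Rpl3-ps1", "mt-Co1", "Mcub", "Rpgr")

# Assumption: mouse ribosomal protein genes start with "Rpl" or "Rps"
# and mitochondrial genes start with "mt-".
is_ribo <- grepl("^Rp[ls]", genes)
is_mito <- grepl("^mt-", genes)

# Keep everything that matches neither pattern
kept <- genes[!(is_ribo | is_mito)]
kept
```

Anchoring the pattern with `^Rp[ls]` rather than a bare `Rp` keeps unrelated genes such as Rpgr in the list.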
# To check S100g, I have to split this into more clusters.
# Each step runs on SO5_new so the result of the previous step is kept.
SO5_new <- RunUMAP(SO5, dims = 1:30, verbose = FALSE)
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
SO5_new <- FindNeighbors(SO5_new, dims = 1:30, verbose = FALSE)
SO5_new <- FindClusters(SO5_new, resolution = 0.4)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 11426
## Number of edges: 376311
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8414
## Number of communities: 9
## Elapsed time: 229 seconds
# I thought 5 4 2 are its own cluster
# So in this it would be cluster
# 8=5 , 4=5 , 2=2 ,
# Cluster 6 should be the s100g cluster
SO5_m <- FindAllMarkers(SO5_new, only.pos = TRUE)
## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8
# Note: despite its name, top10 holds the top 5 markers per cluster (n = 5).
SO5_m %>%
  group_by(cluster) %>%
  dplyr::filter(avg_log2FC > 1) %>%
  slice_head(n = 5) %>%
  ungroup() -> top10
DoHeatmap(SO5_new, features = top10$gene) + NoLegend()
## Warning in DoHeatmap(SO5_new, features = top10$gene): The following features
## were omitted as they were not found in the scale.data slot for the SCT assay:
## Ifi47
Still unknown after this: clusters 0, 1, 3, and 4.
markers.to.plot1 <- c(
  "S100g",
  "Atf3",
  "Egr1",
  "Fos",
  "Jun",
  "Junb",
  "Pappa2",
  "Cxcl10",
  "Cldn19",
  "Krt7",
  "Egf",
  "Aard",
  "Ptger3",
  "Leng9",
  "Ckb",
  "Mcub",
  "Fabp3",
  "Ccn1",
  "Foxq1",
  "Cxcl12",
  "Vash2",
  "Pamr1",
  "Vegfa",
  "Nov"  # reported as not found when plotting; the current symbol may be Ccn3
)
DotPlot(SO5_new,
        features = markers.to.plot1,
        dot.scale = 8,
        dot.min = 0,
        scale.max = 100,
        scale.min = 0,
        col.min = -2.5,
        col.max = 2.5) +
  coord_flip()
## Warning: The following requested variables were not found: Nov
# I feel like cluster 8 is a contaminant because it shares a lot of genes with my "assumed clusters".
Cluster 0 = Aard, Mcub
Cluster 2 = Egf, Krt7, Cldn19
Cluster 4 = Ptger3
Cluster 5 = Junb, Jun, Fos, Egr1, Atf3
Cluster 6 = S100g
Cluster 7 = Cxcl10
Cluster 3 = could be Leng9, but I feel that's too insignificant.
Cluster 1 = No idea. Missing genes for 1 and 3.
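The provisional assignments above can be kept as a lookup table so they are easy to revise. This is a base R sketch: the label strings are just the marker names from the notes, and `cell_clusters` is a hypothetical handful of cells, not data from the object.

```r
# Provisional labels taken from the notes above; clusters 1 and 3 stay
# unlabeled until better markers turn up.
cluster_labels <- c(
  "0" = "Aard/Mcub",
  "2" = "Egf/Krt7/Cldn19",
  "4" = "Ptger3",
  "5" = "Junb/Jun/Fos/Egr1/Atf3",
  "6" = "S100g",
  "7" = "Cxcl10"
)

# Hypothetical cluster assignments for a handful of cells
cell_clusters <- c("0", "6", "1", "5", "3", "2")

# Look up each cell's label by cluster name; missing clusters give NA
labels <- unname(cluster_labels[cell_clusters])
labels[is.na(labels)] <- "unassigned"
labels
```

On the Seurat object itself, `RenameIdents()` could apply the same mapping to the cluster identities once the labels are settled.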
SO7 <- subset(SO5_new, idents = "8", invert = TRUE)
SO7 <- RunUMAP(SO7, dims = 1:30, verbose = FALSE)
SO7 <- FindNeighbors(SO7, dims = 1:30, verbose = FALSE)
SO7 <- FindClusters(SO7, resolution = 0.4)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 11404
## Number of edges: 376163
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8419
## Number of communities: 8
## Elapsed time: 0 seconds
DotPlot(SO7,
        features = markers.to.plot1,
        dot.scale = 8,
        dot.min = 0,
        scale.max = 100,
        scale.min = 0,
        col.min = -2.5,
        col.max = 2.5) +
  coord_flip()
## Warning: The following requested variables were not found: Nov
My assumptions have changed now that I have viewed the violin plots.
I think the main clusters are 5, 6, 7, and 2.
I think 0, 1, 3, and 4 are their own thing but are just undergoing change?
Specifically, I think 1 and 0 are the same, and maybe 4 and 3 as well?
I think cluster 1 turns into 0 in low-salt conditions and cluster 4 turns into 3. Both 1 and 3 express Leng9, which I vaguely recall being linked to something evolutionary; I need to look this up.
According to the multidimensional dot plot, 0 and 1 also look nearly identical apart from the expression of a few specific genes, but that may just be the product of change.
Not sure how to explain 4 and 3, though.
# Analyzing
SO1 and SO4 = control; SO2 and SO3 = low_salt
I'm guessing the two control samples were slightly different, with each expressing some genes more than the other. Low salt was then induced in each of these two control types, giving SO2 and SO3, which differ from each other because their controls differed.
I think it might be a good idea to analyze just SO1 and SO2 together to see cluster 4 turning into 3, and to analyze SO4 and SO3 separately to see cluster 1 changing to 0.
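A sketch of the paired comparison proposed above, using per-sample cluster proportions. Everything in `meta` is invented for illustration (not real data), and the column names `sample` and `cluster` are assumptions about what the object's metadata would provide.

```r
# Hypothetical per-cell metadata (NOT real data): one row per cell.
# In the real object this might come from SO7@meta.data, assuming it has
# columns equivalent to `sample` and `cluster`.
meta <- data.frame(
  sample  = rep(c("SO1", "SO2", "SO3", "SO4"), times = c(10, 10, 10, 10)),
  cluster = c(rep(c("4", "3"), times = c(8, 2)),   # SO1: mostly cluster 4
              rep(c("4", "3"), times = c(3, 7)),   # SO2: mostly cluster 3
              rep(c("1", "0"), times = c(2, 8)),   # SO3: mostly cluster 0
              rep(c("1", "0"), times = c(9, 1)))   # SO4: mostly cluster 1
)

# Proposed pairings from the notes: control first, low salt second
pairs <- list(pair_A = c("SO1", "SO2"), pair_B = c("SO4", "SO3"))

# Fraction of each sample's cells per cluster, computed within each pair
props <- lapply(pairs, function(s) {
  sub <- meta[meta$sample %in% s, ]
  sub$sample <- factor(sub$sample, levels = s)
  prop.table(table(sub$cluster, sub$sample), margin = 2)
})
props
```

If the hypothesis holds, the real data should show one cluster's share shrinking and the other's growing between the control and low-salt sample of each pair.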